TEACHERS, LEARNERS AND ORACLES


ACHILLES A. BEROS AND COLIN DE LA HIGUERA 

Abstract. We exhibit a family of computably enumerable sets which can be learned within 
polynomial resource bounds given access only to a teacher, but which requires exponential 
resources to be learned given access only to a membership oracle. In general, we compare 
the families that can be learned with and without teachers and oracles for four measures of 
efficient learning. 


1. Introduction 

In this paper, we address the question of whether or not the presence of a teacher as a
computational aide improves learning. A teacher is a computable machine that receives data
and selects a subset of the data. In the models we consider, a teacher receives an enumeration
for a target and passes its data selection to the learner; the learner does not have access to
the original data. The first natural question is whether there are families that are learnable
with a teacher, but not learnable without. As will be obvious from the definitions presented in
the next section, the answer is no: the learner can always perform an internal simulation of the
learner-teacher interaction and output the result. The second question is whether a teacher
can improve efficiency. For teacher models of learning, only the computational activity of
the learner counts against the efficiency bound; the computational activity of the teacher is
not counted. Heuristically, the question is whether there is benefit to pre-processing data. We
will prove there can be an exponential improvement in efficiency. In fact, there are situations
where access to a teacher is better than access to a membership oracle about the target.

Various forms of, and questions related to, teaching have arisen in learning theory over the
last few decades. Work on the complexity of teaching families has given rise to the classical
teaching dimension [7] and more recently the recursive teaching dimension [12, 4]. In [4],
Zilles et al. establish deep and interesting connections between recursive teaching dimension,
Vapnik-Chervonenkis dimension and sample compression schemes (see [5] for more about
sample compression). Query learning has been a central topic in learning theory for even
longer than teaching. Numerous papers have been written both on the abilities of machines
equipped with oracles to learn [10, 1, 2] and on the properties of oracles that allow learning
of certain target families [11, 8, 6, 9].

We add to the body of research on teaching and query learning by comparing the efficiency
of the two learning modes.


2. Background 

We will examine variants of Gold-style text learning of effectively describable sets of natural
numbers. In particular, the target objects will be computably enumerable sets.

Definition 2.1. A set, $S$, is computably enumerable (c.e.) if there is a partial computable
function, $f$, such that $S = \mathrm{dom}(f)$. A sequence of sets, $\{A_n\}_{n \in \mathbb{N}}$, is called uniformly
computably enumerable (u.c.e.) if the set $\{\langle a, i \rangle : a \in A_i\}$ is c.e. We also call u.c.e. sequences of
sets indexed families and call $n$ an index for $A_n$. Note that in an indexed family a set may
have multiple indices if the sequence $\{A_n\}_{n \in \mathbb{N}}$ has multiple instances of the same set. For
notational convenience, we regard indexed families both as sequences and as sets and write
$A \in \mathcal{A}$ meaning $(\exists n)(A = A_n)$.

We now remind the reader of some standard notation and concepts as well as introducing 
some notation specific to this paper. 

(1) $\varphi$ denotes an acceptable universal Turing machine and hence, a partial computable
function. $\varphi_{e,s}(x)$ is the state or value of the function described by the program coded
by $e \in \mathbb{N}$ after $s$ computation stages on input $x$. If the program execution has
terminated, we write $\varphi_{e,s}(x)\downarrow$; otherwise we write $\varphi_{e,s}(x)\uparrow$.

(2) $W_e$ is the c.e. set coded by the program $e$ as the domain of $\varphi_e$. $\{W_e\}_{e \in \mathbb{N}}$ is a u.c.e.
sequence of sets and enumerates all the c.e. sets. We write $\mathcal{E}$ for the set of all c.e. sets.

(3) For $n \in \mathbb{N}$, $\langle x_0, x_1, \ldots, x_n \rangle : \mathbb{N}^{n+1} \to \mathbb{N}$ is a polynomial-time computable encoding
function such that $x_i < \langle x_0, x_1, \ldots, x_n \rangle$ for all $i \le n$. We also define a polynomial-time
computable decoding function $(x)_n : \mathbb{N} \to \mathbb{N}^n$ which is the inverse function of the encoding
function $\langle x_0, x_1, \ldots, x_{n-1} \rangle$. We define $A \otimes B = \{\langle a, b \rangle : (a \in A) \wedge (b \in B)\}$. We use $\otimes$
to partition $\mathbb{N}$ into an infinite number of infinite computable sets, $\mathbb{N} \otimes \{0\}, \mathbb{N} \otimes \{1\}, \ldots$.
Sets of this form are known as columns, whereby $\mathbb{N} \otimes \{i\}$ is the $i$-th column of $\mathbb{N}$. As
a shorthand, we will represent the $i$-th column of $\mathbb{N}$ with the symbol $C_i$ and the $i$-th
column of $A \subseteq \mathbb{N}$ by $C_i(A)$. Associated with $C_i$, we define $c_i$ to be a computable
function such that $W_{c_i(x)} = W_x \cap C_i$.

(4) We write $(x_0, x_1, \ldots, x_n)$ to denote the ordered tuple of elements (as opposed to the
encoding of the ordered tuple, $\langle x_0, x_1, \ldots, x_n \rangle$).

(5) We fix an encoding of polynomials as natural numbers and write p* to denote the 
encoding of a polynomial p. The encoding is polynomial-time computable, as is the 
decoding, and maps onto N. 

(6) $\mathrm{signedInt} : \mathbb{N} \to \mathbb{Z}$ is the computable bijection such that $\mathrm{signedInt}(2n) = n$ and
$\mathrm{signedInt}(2n + 1) = -(n + 1)$.

(7) If $a$ is a string or natural number, then $a^i$ denotes the string which consists of $a$
repeated $i$ times.

(8) For function composition we use the notation $f \circ g$ where $(f \circ g)(x) = f(g(x))$.

(9) If $\sigma = a_0 \cdots a_n$ is a string, then $|\sigma| = n + 1$ is the length of the string, $\sigma(k) = a_k$ and
$\mathrm{content}(\sigma) = \{\sigma(k) : k < |\sigma|\}$.

(10) An enumeration of a non-empty set A is an infinite sequence of elements of A such 
that every element of A appears in the sequence at least once. We regard an enumeration
as a stream of bits with markers between individual elements. We will restrict
our attention to non-empty sets. Consequently, we need not consider enumerations of 
the empty set. 

(11) A learning machine (or learner) is a partial computable function that receives a string
as input, may have access to oracle queries and outputs a natural number that is
interpreted as a code for a set. The outputs are called hypotheses and the sequence of
hypotheses produced by a learner on initial segments of an enumeration is called the
hypothesis stream. When measuring efficiency, we allow a learner to skip an element
of an enumeration for some fixed computational cost.

(12) Given an interval $[0, n]$, where $n$ is unknown, but bounded by $a^m$, $n$ can be determined
with $(m + 1)^{a+1}$ or fewer oracle queries using the following algorithm. First, determine
the least $k_0$ such that $a^{k_0} \notin [0, n]$. We will obtain $k_0$ after at most $m + 1$ queries.
Next, we repeat the process to determine the least $k_1$ such that $a^{k_0 - 1} + a^{k_1} \notin [a^{k_0 - 1}, n]$.
By iterating this process at most $a + 1$ times we find $n$. We call this an exponential
query search algorithm.
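
To illustrate item (12), the following is a minimal Python sketch of one reading of the exponential query search. It is an assumption-laden illustration rather than the paper's exact procedure: the function name `exponential_query_search`, the callable `member` standing in for the membership oracle, and the default base 2 are all hypothetical, and the sketch simply repeats the "least power not in the target" step until no further progress is possible. It assumes the target is an interval $[0, n]$, so that 0 is always a member.

```python
from typing import Callable, Tuple


def exponential_query_search(member: Callable[[int], bool], base: int = 2) -> Tuple[int, int]:
    """Find n for a target interval [0, n] using only membership queries.

    `member` is a hypothetical stand-in for the membership oracle.
    Returns the pair (n, number_of_queries_made).
    """
    if base < 2:
        raise ValueError("base must be at least 2")
    queries = 0

    def ask(x: int) -> bool:
        nonlocal queries
        queries += 1
        return member(x)

    low = 0  # largest value currently known to be in the target (0 is assumed to be a member)
    while True:
        # Find the least k such that low + base**k is NOT in the target.
        k = 0
        while ask(low + base ** k):
            k += 1
        if k == 0:
            # low + 1 is not in the target, so low is the greatest element.
            return low, queries
        # low + base**(k - 1) was confirmed to be in the target; restart from there.
        low += base ** (k - 1)


if __name__ == "__main__":
    n, used = exponential_query_search(lambda x: 0 <= x <= 1000)
    print(n, used)
```

With base 2, each round adds the largest power of two that still fits below $n$, so the number of rounds and the number of queries per round are both logarithmic in $n$.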

We will consider learning models using combinations of three different data sources:
enumeration, oracle and teacher. All of the models we consider are forms of TxtEx-learning, or
learning in the limit. We begin with the definition of this fundamental learning model.

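Definition 2.2. Let $M$ be a computable learning machine, $\mathcal{F} = \{F_n\}_{n \in \mathbb{N}}$ an indexed family
and $\{a_n\}_{n \in \mathbb{N}}$ an enumeration (text) of a set $F \in \mathcal{F}$.

(1) $M$ TxtEx-identifies $\{a_n\}_{n \in \mathbb{N}}$ if

$(\exists i)(\forall j)\,\big(M(a_0 \cdots a_{i+j}) = M(a_0 \cdots a_i) \wedge F_{M(a_0 \cdots a_i)} = F\big)$

If only the first condition above is met, i.e., $(\exists i)(\forall j)(M(a_0 \cdots a_{i+j}) = M(a_0 \cdots a_i))$, then
we say that $M$ has converged on the enumeration $\{a_n\}_{n \in \mathbb{N}}$.

(2) $M$ TxtEx-learns $F$ if $M$ TxtEx-identifies every enumeration of $F$.

(3) $M$ TxtEx-learns $\mathcal{F}$ if $M$ TxtEx-learns every $F \in \mathcal{F}$.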

All of the models we examine in this paper are variants of TxtEx-learning. The parameters
we will vary are linked to sources of information and the measurement of efficiency. We state
definitions of these variants starting from an arbitrary learning model.

Definition 2.3. Let L-learning be an arbitrary learning model.

(1) We say that $\mathcal{F}$ is L-learnable with a membership oracle (denoted L[O]-learnable)
if there is a learning machine, $M$, that L-learns $\mathcal{F}$ and has access to a membership
oracle for the target it is learning. As membership oracles are the only oracles we will
consider, we often simply refer to a membership oracle as an oracle.

(2) A function $T : 2^{<\omega} \to 2^{<\omega}$ is a teacher if it is a computable function, $T(\sigma)$ is a prefix
of $T(\tau)$ whenever $\sigma$ is a prefix of $\tau$, and $\mathrm{content}(T(\sigma)) \subseteq \mathrm{content}(\sigma)$. We say that
$\mathcal{F}$ is L-learnable with a teacher (denoted L[T]-learnable) if there is a learner-teacher
pair $(M, T)$ such that $M$ L-identifies every enumeration of the form $T \circ f$, where $f$
enumerates a member of $\mathcal{F}$.

(3) We say that $\mathcal{F}$ is L-learnable with a teacher and a membership oracle (denoted
L[T,O]-learnable) if there is a learner-teacher pair, $(M, T)$, such that $M$ has access
to a membership oracle, $T$ has access to the query responses $M$ receives, and $M$
L-identifies every enumeration of the form $T \circ f$, where $f$ enumerates a member of
$\mathcal{F}$.

As is clear from the definition, the teacher serves to pre-process the text input before passing
the elements deemed important to the learner. In the subsequent sections, we will consider
the different combinations of teacher and oracle with certain variants of TxtEx-learning.

When defining efficiency notions for learning, the first natural notion is that of polynomial
run-time: the learner must converge within $p(e)$ computation steps, where $p$ is a polynomial
and $e$ is a code for the target. There are two problems with this definition. First, apart from
trivial cases, any learning process can be delayed arbitrarily by using an enumeration that
repeats a single element of the target set. Second, if a learning machine has produced an
encoding of the target, but has failed to do so in polynomial run-time, a suitably larger and
equivalent encoding can be chosen instead so that the run-time is appropriately bounded. As
we are considering indexed families, rather than general classes of c.e. sets, we can address
the second problem by fixing a reference index for every set in the family against which
efficiency is measured.

Definition 2.4. Let $\mathcal{F} = \{A_n\}_{n \in \mathbb{N}}$ be an indexed family. We define the minimal index of $A \in \mathcal{F}$
(symbolically, $\mathrm{mi}_{\mathcal{F}}(A)$) to be the least $n$ such that $A_n = A$.

By restricting our attention to indexed families, we have a well-defined concept of polynomial
bounds in the size of the target that is independent of the underlying numbering of the
c.e. sets, thereby addressing the second problem. In the absence of an oracle or teacher the 
first problem remains. Nevertheless, we include polynomial run-time among the notions of 
efficiency that we define below as it is reasonable when an oracle or teacher is present. 

We will address four measures of learning efficiency: Polynomial run-time, polynomial 
size dataset, polynomial size characteristic sample, and polynomial mind-changes. 

We have also proved [3] the results presented in this paper for general classes of c.e. sets 
equipped with an indexing function. Nevertheless, in this paper we restrict our attention to 
the limited case of indexed families as it is a more familiar context than the indexed target 
families required by the general case. In that more general case, the indexing function selects 
a unique code from the underlying numbering for each set in the class. The codes output by 
the indexing function are taken as the reference against which efficiency is computed. 

3. Polynomial Run-Time 

Definition 3.1. An indexed family $\mathcal{F} = \{F_n\}_{n \in \mathbb{N}}$ is polynomial run-time learnable
(PRT-learnable) if there is a machine $M$ and a polynomial $p$ such that for every enumeration $f$
of $F \in \mathcal{F}$, the learner $M$ converges to a correct index on $f$ in fewer than $p(\mathrm{mi}_{\mathcal{F}}(F))$
computation steps. If an oracle is accessed, oracle use must also be bounded by $p(\mathrm{mi}_{\mathcal{F}}(F))$. We use
PRT to denote the set of all PRT-learnable indexed families.

We will apply Definition 2.3 to Definition 3.1 to obtain, for example, PRT[T]-learning and
PRT[T], the PRT[T]-learnable indexed families.

Proposition 3.2 demonstrates that PRT-learnability is much too restrictive in the absence
of an oracle or teacher.

Proposition 3.2. Let $\mathcal{F}$ be an indexed family. If there are $A, B \in \mathcal{F}$ such that $A \neq B$ and
$A \cap B \neq \emptyset$ then $\mathcal{F}$ is not PRT-learnable.

Proof. Let $A$, $B$ and $\mathcal{F}$ be as in the statement, let $M$ be an arbitrary learning machine and
$p$ an arbitrary increasing polynomial. Also, let $a = \mathrm{mi}_{\mathcal{F}}(A)$, $b = \mathrm{mi}_{\mathcal{F}}(B)$ and let $x \in A \cap B$.
Define $f_A$ to be an enumeration of $A$ that begins with $x^{p(a)+p(b)}$ and $f_B$ to be an enumeration of
$B$ that begins with $x^{p(a)+p(b)}$. If $M$ PRT-identifies $f_A$, then $M(\sigma)$ must be a code for $A$ for any
$\sigma = x^{p(a)+i}$ for $0 < i \le p(b)$. Similarly, if $M$ PRT-identifies $f_B$, then $M(\sigma)$ must be a code for $B$ for
any $\sigma = x^{p(b)+i}$ for $0 < i \le p(a)$. Thus, no machine can PRT-identify both $f_A$ and $f_B$, and $\mathcal{F}$ is not
PRT-learnable.

□ 

On the other hand, there are many non-trivial indexed families which are PRT[O]-, PRT[T]-
or PRT[T,O]-learnable.

Example 3.3. Define $F_n = [n, \infty)$, $G_{\langle m,n \rangle} = [m, n]$ and $H^k_n = \mathrm{content}((n)_k)$. The indexed
families $\mathcal{F} = \{F_n\}_{n \in \mathbb{N}}$, $\mathcal{G} = \{G_n\}_{n \in \mathbb{N}}$ and $\mathcal{H}_k = \{H^k_n\}_{n \in \mathbb{N}}$ for $k \in \mathbb{N}$ are PRT[O]-learnable.

Example 3.4. The indexed families in Example 3.3 are also PRT[T]-learnable. For example,
consider $\mathcal{H}_k$ for some fixed $k$. Define a teacher $T$ such that $T(a_0 \cdots a_n) = T(a_0 \cdots a_{n-1})a_n$ if
$a_n \notin \{a_0, \ldots, a_{n-1}\}$ and outputs $T(a_0 \cdots a_{n-1})$ otherwise. Define a learner $M$ that waits until it
has received $k+1$ distinct numbers, $\{b_0, \ldots, b_k\}$, from $T$ and then outputs $\langle b_0, \ldots, b_k \rangle$. $(M, T)$
PRT[T]-learns $\mathcal{H}_k$.
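
The teacher and learner of Example 3.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: strings are modelled as Python lists of integers, the names `teacher` and `learner` are hypothetical, and a plain tuple stands in for the polynomial-time encoding of the hypothesis, which the text does not spell out.

```python
from typing import List, Optional, Sequence, Tuple


def teacher(sigma: Sequence[int]) -> List[int]:
    """T(sigma): keep the first occurrence of each element and drop all repeats."""
    seen = set()
    out: List[int] = []
    for a in sigma:
        if a not in seen:
            seen.add(a)
            out.append(a)
    return out


def learner(tau: Sequence[int], k: int) -> Optional[Tuple[int, ...]]:
    """M: wait for k + 1 numbers from the teacher, then output them.

    The returned tuple is a stand-in (an assumption of this sketch) for the
    encoded hypothesis; None means that no hypothesis has been produced yet.
    """
    if len(tau) < k + 1:
        return None
    return tuple(tau[: k + 1])


if __name__ == "__main__":
    text = [3, 3, 3, 1, 3, 2, 1]
    for i in range(1, len(text) + 1):
        print(text[:i], "->", learner(teacher(text[:i]), k=2))
```

Because the teacher removes repetitions, the learner's work no longer depends on how long an enumeration stalls on a single element, which is the obstruction that Proposition 3.2 exploits.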

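Proposition 3.5. $\mathrm{PRT} \subset \mathrm{PRT}[O] \subseteq \mathrm{PRT}[T,O]$ and $\mathrm{PRT} \subset \mathrm{PRT}[T] \subseteq \mathrm{PRT}[T,O]$.

Proof. Let $\mathcal{H}_2$ be as above. As observed in Examples 3.3 and 3.4, $\mathcal{H}_2$ is PRT[O]-learnable
and PRT[T]-learnable, but by Proposition 3.2, $\mathcal{H}_2$ is not PRT-learnable. Thus, $\mathrm{PRT} \subset \mathrm{PRT}[O] \cap
\mathrm{PRT}[T]$. The other containments follow from the definitions.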

□ 

We now produce indexed families that distinguish PRT[T]-learning from PRT[O]-learning
and PRT[T,O]-learning from both PRT[O]- and PRT[T]-learning. In order to prove that all of
these distinctions are non-trivial, we introduce the concept of marked self-description.

3.1. Marked Self-Describing Sets. Including self-description in an object is an encoding
technique on which many important learning theory examples are based. Examples of
self-description include the self-describing sets $SD = \{A \in \mathcal{E} : W_{\min(A)} = A\}$, and the almost
self-describing functions $ASD = \{f : \varphi_{f(0)} =^* f\}$. Many variants on the self-description theme
have been explored in learning theory and inductive inference.

Our interest is in families that use carefully engineered self-description to calibrate the
difficulty in identifying their members. We will construct families whose members are not
only self-describing, but also have their self-describing elements marked for ease of
identification. We say that such families exhibit marked self-description. In particular, we will use
encapsulating objects that we call descriptors.

Definition 3.6. For finite $X \subset \mathbb{N}$, a descriptor on the $i$-th column is a finite set
$D = \{\langle x, c_x, 1, i \rangle : x \in X\} \subset C_i(C_1)$ such that

(1) $\sum_{x \in X} \mathrm{signedInt}(c_x) = 0$

(2) $(\forall X' \subset X)\,\big(\sum_{x \in X'} \mathrm{signedInt}(c_x) \neq 0\big)$

(3) $\sum_{x \in X} \mathrm{signedInt}(x) > 0$.

Such a descriptor is said to describe the natural number $n = \sum_{x \in X} \mathrm{signedInt}(x)$. For
$\langle x, c_x, 1, i \rangle \in D$, we call $c_x$ the completion index of the element. For $n \in \mathbb{N}$, we define
$\mathrm{Descriptors}_i(n)$ to be the set of all descriptors on the $i$-th column that describe $n$.

A descriptor can be thought of as a stream of data that includes parity bits to check the
integrity of the data stream and where the intended message is the number described by the
descriptor. Thus, a machine can decide not only which elements are pieces of the descriptor
(packets in the stream), but also decide when the entire descriptor has appeared in the
enumeration (all the packets have been received). By using a descriptor to encode the
self-description for a set, we make the self-description instantly recognizable upon appearance in
the enumeration. For this reason, learning such a self-describing set can be achieved with
no mind-changes. In contrast to the degree to which we have made learning easier, we have
potentially made efficient learning harder. By distributing the self-description into a large
descriptor, we will create a scenario in which a very large amount of data is required to reach
a correct decision. We now proceed to our first result using these tools.
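
The "parity check" reading of a descriptor can be made concrete with a small Python sketch. It is an illustration under assumptions, not part of the construction: elements are kept as plain 4-tuples `(x, c_x, 1, i)` instead of being encoded as natural numbers, the names `signed_int` and `described_number` are hypothetical, condition (2) is read as ranging over non-empty proper subsets, and the subset check is brute force and therefore only suitable for small examples.

```python
from itertools import combinations
from typing import Iterable, Optional, Tuple

# (x, c_x, 1, i): kept as a tuple here; the paper encodes it as a single number.
Element = Tuple[int, int, int, int]


def signed_int(k: int) -> int:
    """The bijection from item (6): 2n -> n and 2n + 1 -> -(n + 1)."""
    return k // 2 if k % 2 == 0 else -(k // 2 + 1)


def described_number(d: Iterable[Element], column: int) -> Optional[int]:
    """Return the number described by d if d is a descriptor on `column`, else None."""
    elems = list(set(d))
    if not elems or any(e[2] != 1 or e[3] != column for e in elems):
        return None
    xs = [e[0] for e in elems]
    if len(set(xs)) != len(xs):  # each x carries exactly one completion index
        return None
    completions = [signed_int(e[1]) for e in elems]
    if sum(completions) != 0:  # condition (1): the whole descriptor has arrived
        return None
    for r in range(1, len(elems)):  # condition (2): no smaller piece looks complete
        for subset in combinations(completions, r):
            if sum(subset) == 0:
                return None
    n = sum(signed_int(x) for x in xs)
    return n if n > 0 else None  # condition (3)


if __name__ == "__main__":
    d = {(4, 1, 1, 0), (6, 2, 1, 0)}
    print(described_number(d, column=0))
```

In the example, the completion indices decode to $-1$ and $1$, so the pair is complete while neither element alone is, and the described number is $\mathrm{signedInt}(4) + \mathrm{signedInt}(6)$.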

Lemma 3.7. There is an indexed family $\{F_n\}_{n \in \mathbb{N}}$, where $F_n$ describes $n$, which is PRT[T]-
learnable, but not PRT[O]-learnable. We call this indexed family the marked self-describing
sets and designate it by $MSD$.

Proof. Fix $n \in \mathbb{N}$ and let learning machine $M$ and polynomial $p$ be such that $n = \langle m, p^*, i \rangle$,
where $\varphi_m = M$ and $i \in \{0, 1\}$. Without loss of generality, we may assume that $p$ is increasing.
Consider the situation where $M$ has access to the membership oracle for the singleton
$\{\langle 0,1,1,0 \rangle\}$ and define a computable function $q$ such that, for $\ell \in \mathbb{N}$, $q(\ell)$ is the greatest number
about which $M$ queries the oracle when it receives inputs which are substrings of $\langle 0,1,1,0 \rangle^{\ell}$.
Note that $q$ is an increasing function. Define $F_n$ to be a member of $\mathrm{Descriptors}_0(n)$ such that

(1) $F_n \cap [0, q(p(\langle m, p^*, 1 \rangle))] = \{\langle 0,1,1,0 \rangle\}$

and chosen according to a fixed algorithm so that $MSD = \{F_n\}_{n \in \mathbb{N}}$ is u.c.e.

First, we show that $MSD$ is PRT[T]-learnable. We define a teacher $T$ as follows. If
$\mathrm{content}(\sigma)$ is not a descriptor, $T(\sigma)$ is the empty string. If $D = \mathrm{content}(\sigma)$ describes $n$, then
$T(\sigma) = \min(D)^i$ if $|\sigma| = |\sigma_0| + i$ where $\sigma_0$ is the shortest initial segment of $\sigma$ whose content
contains $D$ and $i < n$; if $i = n$ then $T(\sigma) = \min(D)^n$. Having output $\min(D)$ $n$ times, $T$ proceeds
by enumerating $D$ in decreasing order. Let $M$ be a machine that reads the output of $T$ and
returns the number of elements in the output of $T$. The teacher-learner pair learns $MSD$ and
the run-time of the learner is linear in the index of the target.

We now show that $MSD$ is not PRT[O]-learnable. Fix a learner $M = \varphi_m$, an increasing
polynomial $p$ encoded by $p^*$, $n_0 = \langle m, p^*, 0 \rangle$ and
$n_1 = \langle m, p^*, 1 \rangle$. If $M$ PRT[O]-learns $MSD$ with polynomial bound $p$, then it must succeed at
identifying $F_{n_0}$ and $F_{n_1}$ within $p(n_1) > p(n_0)$ computation stages. Choose $T_0$ and $T_1$ to be any
enumerations of $F_{n_0}$ and $F_{n_1}$, respectively, which have $\langle 0,1,1,0 \rangle^{p(n_1)}$ as an initial segment.
When trying to identify $T_0$ and $T_1$, the learner must reach its final hypothesis before finding
any elements of the target sets, $F_{n_0}$ and $F_{n_1}$, other than $\langle 0,1,1,0 \rangle$. Whatever hypothesis $M$
converges to before completing the $p(n_1)$ length initial segment of either enumeration cannot
code both sets. Thus, $M$ fails to learn at least one of the two sets. Since $M$ and $p$ were chosen
arbitrarily we conclude that $MSD$ is not PRT[O]-learnable.

□ 

Lemma 3.8. If $\mathcal{F} = \{F_n\}_{n \in \mathbb{N}}$ is an indexed family, $p$ a polynomial and there are indices
$a, b_0, b_1, \ldots, b_{p(a)-1}$ such that $F_{b_0} \subset F_{b_1} \subset \cdots \subset F_{b_{p(a)-1}} \subset F_a$, then $\mathcal{F}$ is not PRT[T]-learnable
with polynomial bound $p$.

Proof. Let $\mathcal{F}$, $p$, $a$ and $b_0, \ldots, b_{p(a)-1}$ be as in the statement and let $(M, T)$ be an arbitrary
learner-teacher pair. Let $\sigma_0$ be an initial segment of an enumeration of $F_{b_0}$ on which $M \circ T$
outputs an index for $F_{b_0}$ (if no such $\sigma_0$ exists, then $(M, T)$ has already failed to learn $\mathcal{F}$).
Given $\sigma_n$, an initial segment of an enumeration of $F_{b_n}$ for $n < p(a) - 1$, define $\sigma_{n+1}$ to be an
initial segment of an enumeration of $F_{b_{n+1}}$ extending $\sigma_n$ on which $(M, T)$ outputs an index
for $F_{b_{n+1}}$. Again, if no such extension can be found, then $M$ has failed to learn $\mathcal{F}$. Let $T$ be
an enumeration of $F_a$ which has $\sigma_{p(a)-1}$ as an initial segment. Since $M$ changes hypothesis
at least $p(a)$ times on $\sigma_{p(a)-1}$, either $M$ fails to identify $T$ or the runtime of the learner cannot
be bounded by $p(a)$.

□ 

Lemma 3.9. There is a PRT[O]-learnable indexed family that is not PRT[T]-learnable. We
call this indexed family the column self-describing sets and designate it by $CSD$.

Proof. Define

(2) $a_n = n + 1 + \sum_{i=0}^{n-1} p_i(a_i)$,

where $p_i$ is the polynomial such that $p_i^* = i$. Fix $n \in \mathbb{N}$ and define
$A_n = [0, a_n] \otimes \{p_n(a_n)\} \cup \bigcup_{i < p_n(a_n)} [0, a_n + i] \otimes \{i\}$ and
$B_{n,i} = \bigcup_{j \le i} [0, a_n + j] \otimes \{j\}$, for $i < p_n(a_n)$. Finally, define $F_n = A_i$
if $n = a_i$ and $F_n = B_{i,j}$ if $n = a_i + j$ where $j < p_i(a_i)$. Let $CSD = \{F_n\}_{n \in \mathbb{N}}$. To PRT[O]-learn
$CSD$, define $M$ to be a learning machine that uses the exponential query search algorithm
to find the highest index non-empty column, queries about the members of the column, in
increasing order, until the greatest element is found, and returns the value of this element.
Since the number of queries involved is polynomially bounded in $e$, $M$ witnesses the desired
learnability.

Since $B_{n,0} \subset B_{n,1} \subset \cdots \subset B_{n,p_n(a_n)-1} \subset A_n$, for each polynomial, $p$, there is a subfamily of $CSD$
that cannot be PRT[T]-learned with efficiency bound $p$. Thus, $CSD$ is not PRT[T]-learnable.

□ 

Finally, we wish to distinguish PRT[T,O]-learning from both PRT[T]-learning and PRT[O]-
learning.

Lemma 3.10. There is an indexed family which is PRT[T,O]-learnable, but neither PRT[T]-
learnable nor PRT[O]-learnable.

Proof. To prove the claim, we must combine the strategies used in the proofs of Lemma 3.7
and Lemma 3.9. Define $F_n$ exactly as the members of $MSD$ are defined except we modify
formula (1) to be

$F_n \cap [0, q(p(3\langle m, p^*, 1 \rangle))] = \{\langle 0,1,1,0 \rangle\}$.

We also define $G_n$ exactly as the members of $CSD$ are defined except that we replace formula
(2) by

$a_n = 3(n + 1) + \sum_{i=0}^{n-1} p_i(3a_i)$.

Finally, we define $\mathcal{H} = \{H_n\}_{n \in \mathbb{N}}$ where

$H_n = G_i$ if $n = 2i$,
$H_n = F_i$ if $n = 2i + 1$.

We will show that $\mathcal{H} = \{H_n\}_{n \in \mathbb{N}}$ is PRT[T,O]-learnable, but neither PRT[T]-learnable nor
PRT[O]-learnable. To PRT[T,O]-learn $\mathcal{H}$, let $M$ be a learner which first determines if the
target set contains 0 using an oracle query. If the target does, then $M$ proceeds as the PRT[O]-
learner in the proof of Lemma 3.9, multiplying the hypotheses output by that learner by 2. If
the target does not contain 0, then $M$ proceeds as the PRT[T]-learner in the proof of Lemma
3.7, multiplying the hypotheses output by that learner by 2 and adding 1. $M$ PRT[T,O]-learns
$\mathcal{H}$ with only a linear decrease in efficiency compared to the two learners from the previous
lemmas.

To see that $\mathcal{H}$ is neither PRT[O]-learnable nor PRT[T]-learnable, observe that the proofs of
Lemmas 3.7 and 3.9 suffice to show that $\mathcal{H}$ contains two indexed subfamilies, one of which
fails to be PRT[O]-learnable and the other fails to be PRT[T]-learnable.


□ 


For clarity, we summarize the results of Section 3 in the following theorem. 

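Theorem 3.11.

(1) $\mathrm{PRT} \subset \mathrm{PRT}[O] \subset \mathrm{PRT}[T,O]$,

(2) $\mathrm{PRT} \subset \mathrm{PRT}[T] \subset \mathrm{PRT}[T,O]$,

(3) $\mathrm{PRT}[O] \setminus \mathrm{PRT}[T] \neq \emptyset$,

(4) $\mathrm{PRT}[T] \setminus \mathrm{PRT}[O] \neq \emptyset$.

Proof. All of the claims in the statement follow from Lemmas 3.7, 3.9 and 3.10 and
Proposition 3.5.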


□ 


4. Polynomial Size Dataset 


Definition 4.1. An indexed family, $\mathcal{F} = \{F_n\}_{n \in \mathbb{N}}$, is polynomial size dataset learnable (PSD-
learnable) if there is a machine $M$ and a polynomial $p$ such that for any enumeration $f$ of
$F \in \mathcal{F}$, $M$ converges to a correct index on an initial segment $f \restriction n$ such that $|\{f(x) : x < n\}| <
p(\mathrm{mi}_{\mathcal{F}}(F))$. If an oracle is accessed, oracle use must also be bounded by $p(\mathrm{mi}_{\mathcal{F}}(F))$.

Note that oracle use bounds both the queries to which the oracle responds in the positive
and those to which it responds in the negative. We shall apply Definition 2.3 to Definition 4.1
much as we did in the case of Definition 3.1.


Proposition 4.2. $\mathrm{PSD} \subseteq \mathrm{PSD}[O] \subseteq \mathrm{PSD}[T,O]$ and $\mathrm{PSD} \subseteq \mathrm{PSD}[T] \subseteq \mathrm{PSD}[T,O]$.

Proof. The claim follows from the definitions of PSD, PSD[T], PSD[O] and PSD[T,O].


□ 


Unlike PRT-learning, there are non-trivial PSD-learnable indexed families.

Example 4.3. Let $\mathcal{F}$ be an indexed family containing all the finite sets such that $\mathrm{mi}_{\mathcal{F}}(F) =
\langle |F|, e \rangle$, where $e$ is the canonical code for $F$. $\mathcal{F}$ is PSD-learnable by the learning machine $M$
where $M(a_0 \cdots a_n) = \langle |\mathrm{content}(a_0 \cdots a_n)|, a \rangle$, where $a$ is the canonical code for $\mathrm{content}(a_0 \cdots a_n)$.

Example 4.4. Let $F_n = [0, 2^n]$. $\mathcal{F} = \{F_n\}_{n \in \mathbb{N}}$ is PSD[O]-learnable by a learning machine
that uses the exponential query search algorithm to find the greatest element. $\mathcal{F}$ is PSD[T]-
learnable by the pair (M, T) where M{a^Q ,..., ) = ko and T(ao--- ak+i) = aQaiak+i if 

2" < maxjao,.. ■,ak+i} < 2"'^^ and maxjao, ...,ak}<2". F is not PSD-leamable as the learner 
may be forced to receive $2^{n-1}$ distinct elements before converging to a correct hypothesis.

Lemma 4.5. There is an indexed family which is PSD[T]-learnable, but not PSD[O]-learnable.

Proof. We prove the claim using a strategy similar to that used in the proof of Lemma 3.7.
Following the notation established in the proof, the only differences are that we define $q(\ell)$
to be the maximum number about which $M$ queries the oracle when it receives inputs which
are substrings of $\langle 0,1,1,0 \rangle \langle 2,1,1,0 \rangle \cdots \langle 2\ell,1,1,0 \rangle$, given the oracle for $\{\langle 2i,1,1,0 \rangle : i \le \ell\}$, and
that we define $F_n$ to be a member of $\mathrm{Descriptors}_0(n)$ such that

$F_n \cap [0, q(p(\langle m, p^*, 1 \rangle))] = \{\langle 2i,1,1,0 \rangle : i \le p(\langle m, p^*, 1 \rangle)\}$.

Let $\mathcal{F} = \{F_n\}_{n \in \mathbb{N}}$. The proof that $\mathcal{F}$ is PSD[T]-learnable is exactly the same as the proof that
$MSD$ is PRT[T]-learnable. That $\mathcal{F}$ is not PSD[O]-learnable follows from the observation
that for arbitrary $M = \varphi_m$, $M$ cannot distinguish between $F_{\langle m,p^*,0 \rangle}$ and $F_{\langle m,p^*,1 \rangle}$ on increasing
enumerations without receiving more than $p(\langle m, p^*, 1 \rangle)$ elements of an enumeration of the
target.

□ 

Theorem 4.6.

(1) $\mathrm{PSD} \subset \mathrm{PSD}[O] \subset \mathrm{PSD}[T,O]$,

(2) $\mathrm{PSD} \subset \mathrm{PSD}[T] \subset \mathrm{PSD}[T,O]$,

(3) $\mathrm{PSD}[O] \setminus \mathrm{PSD}[T] \neq \emptyset$,

(4) $\mathrm{PSD}[T] \setminus \mathrm{PSD}[O] \neq \emptyset$.

Proof. By Lemma 4.5 and Example 4.4, we need only prove that $\mathrm{PSD}[O] \setminus \mathrm{PSD}[T] \neq \emptyset$ and
$\mathrm{PSD}[T] \cup \mathrm{PSD}[O] \subset \mathrm{PSD}[T,O]$.

Observe that the proof of Lemma 3.8 demonstrates that an indexed family meeting the
hypotheses of the lemma is not PSD[T]-learnable. Thus, Lemma 3.9 proves that $CSD \in
\mathrm{PSD}[O] \setminus \mathrm{PSD}[T]$.

Following the proof of Lemma 3.10, merging the families constructed in Lemmas 4.5 and
3.9 with suitable modifications produces an indexed family which is PSD[T,O]-learnable, but
neither PSD[T]-learnable nor PSD[O]-learnable.

□ 

It follows from the definitions that $\mathrm{PRT} \subseteq \mathrm{PSD}$, $\mathrm{PRT}[O] \subseteq \mathrm{PSD}[O]$, $\mathrm{PRT}[T] \subseteq \mathrm{PSD}[T]$
and $\mathrm{PRT}[T,O] \subseteq \mathrm{PSD}[T,O]$. With the following theorem, we show that all four of these
containments are strict.

Theorem 4.7. $\mathrm{PSD} \setminus \mathrm{PRT}[T,O] \neq \emptyset$.

Proof. Let $K$ denote the halting problem. Define $\mathcal{F} = \{F_0, F_1, \ldots\}$ where

• $F_{2i+1} = \{2i\}$ if $i \notin K$ and $F_{2i+1} = \{2i, 2i+1\}$ if $i \in K$.

• $F_{2^{2^i}} = \{2i, 2i+1\}$.

• For all even numbers $2i$ with $2i \neq 2^{2^k}$ for every $k$, $F_{2i} = \emptyset$.

That $\mathcal{F}$ is PSD-learnable is witnessed by the learner $M$ which outputs 0 on the empty string,
outputs $2i + 1$ on a string with one unique element which is $2i$ and outputs $2^{2^i}$ on any other
string, if the string contains either $2i$ or $2i + 1$. On the other hand, suppose that the learner-
teacher pair, $(N, T)$, PRT[T]-learns $\mathcal{F}$ with polynomial bound $p$. Computing and returning
$2^{2^i}$ cannot be done within $p(2i + 1)$ computation steps for more than finitely many values of
$i$; thus, for all but finitely many $i$, $i \notin K$ if and only if $(\exists j)(N(T(2i\,(2i + 1)^j)) = 2^{2^i})$. Since
this would imply that $\overline{K}$ is $\Sigma^0_1$, we have arrived at a contradiction and must conclude that
$\mathcal{F} \notin \mathrm{PRT}[T]$. Observe that the use of an oracle does not facilitate learning in this case and so
we conclude that $\mathrm{PSD} \setminus \mathrm{PRT}[T,O] \neq \emptyset$.

□ 


5. Polynomial Mind Changes 

Definition 5.1. An indexed family, $\mathcal{F} = \{F_n\}_{n \in \mathbb{N}}$, is polynomial mind-changes learnable
(PMC-learnable) if there is a machine $M$ and a polynomial $p$ such that for every enumeration
$f$ of $F \in \mathcal{F}$, the hypothesis stream, $g$, generated by $M$ on $f$ satisfies $|\{i : g(i) \neq g(i + 1)\}| <
p(\mathrm{mi}_{\mathcal{F}}(F))$ and the only hypothesis that appears infinitely many times in $g$ is an index of $F$. If an
oracle is accessed, oracle use must also be bounded by $p(\mathrm{mi}_{\mathcal{F}}(F))$.

We begin with an example exhibiting three PMC-learnable indexed families.

Example 5.2. Let $\mathcal{F}$ be an indexed family containing all the finite sets such that $\mathrm{mi}_{\mathcal{F}}(F) =
\langle |F|, e \rangle$, where $e$ is the canonical code for $F$. $\mathcal{F}$ is PMC-learnable as witnessed by the learning
machine $M$ such that $M(a_0 \cdots a_k) = \langle |\mathrm{content}(a_0, \ldots, a_k)|, e \rangle$, where $e$ is the canonical code for
the finite set of distinct elements in $a_0, \ldots, a_k$. On any enumeration of a finite set, $F$, $M$ will
change its hypothesis at most $|F|$ times.

Let $\mathcal{F} = \{F_n\}_{n \in \mathbb{N}}$, where $F_n = [0, 2^n]$. $\mathcal{F}$ is PMC-learnable. Define $M$ such that $M(\sigma)$ is a
code for $[0, 2^s]$, where $s$ is the least integer greater than or equal to $\log_2(\max(\sigma))$.

$MSD$ is PMC-learned by a learning machine that waits until a descriptor has appeared in
the enumeration and then outputs the number the descriptor describes.
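
The second family of Example 5.2 lends itself to a small experiment. The Python sketch below is illustrative only: the function names are hypothetical, a finite list stands in for an initial segment of an enumeration, and the hypothesis on a string whose maximum is 0 is taken to be 0 (an assumption, since $\log_2 0$ is undefined). It computes the hypothesis stream of the learner for $F_n = [0, 2^n]$ and counts its mind changes.

```python
from typing import List, Sequence


def hypothesis(sigma: Sequence[int]) -> int:
    """Least s with 2**s >= max(sigma); taken to be 0 when max(sigma) == 0."""
    m = max(sigma)
    return (m - 1).bit_length() if m > 0 else 0


def hypothesis_stream(text: Sequence[int]) -> List[int]:
    """Hypotheses of the learner on every non-empty initial segment of text."""
    return [hypothesis(text[: i + 1]) for i in range(len(text))]


def mind_changes(stream: Sequence[int]) -> int:
    """Number of positions i at which the hypothesis changes: g(i) != g(i + 1)."""
    return sum(1 for i in range(len(stream) - 1) if stream[i] != stream[i + 1])


if __name__ == "__main__":
    increasing = list(range(9))              # an enumeration order for [0, 2**3]
    shuffled = [0, 1, 5, 3, 8, 2, 7, 6, 4]   # another order for the same set
    for text in (increasing, shuffled):
        g = hypothesis_stream(text)
        print(g, mind_changes(g))
```

On any ordering of $[0, 2^n]$ the hypothesis can only increase and never exceeds $n$, so the number of mind changes is at most $n$, while the number of distinct elements seen before convergence can be much larger.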

Theorem 5.3. $\mathrm{PMC}[T] = \mathrm{PMC} = \mathrm{PSD}[T]$ and $\mathrm{PMC}[T,O] = \mathrm{PMC}[O]$.

Proof. Fix an arbitrary indexed family $\mathcal{F}$. If $(M, T)$ PMC[T]-learns $\mathcal{F}$, then $M \circ T$ PMC-
learns $\mathcal{F}$. Since every PMC-learnable indexed family is also PMC[T]-learnable, $\mathrm{PMC} =
\mathrm{PMC}[T]$. Similarly, $\mathrm{PMC}[T,O] = \mathrm{PMC}[O]$.

Suppose $(M, T)$ PSD[T]-learns $\mathcal{F}$ and define $M^*$ such that $M^*(a_0 \cdots a_{k+1}) = M \circ T(a_0 \cdots
a_{k+1})$ when $T(a_0 \cdots a_{k+1}) \neq T(a_0 \cdots a_k)$ and $M^*(a_0 \cdots a_{k+1}) = M^*(a_0 \cdots a_k)$, otherwise. Since
$(M, T)$ PSD[T]-learns $\mathcal{F}$, the number of distinct elements that $T$ outputs before $M \circ T$
converges to a correct hypothesis is polynomially bounded, hence $M^*$ changes hypothesis a
polynomially bounded number of times. Thus, $\mathcal{F} \in \mathrm{PMC}$.

Define functions $f$ and $g$ such that $f(\sigma) = |\sigma|$ and $g(n, x) = x^n$, the string $x$ repeated $n$ times.
Suppose $M$ PMC-learns $\mathcal{F}$. Define $T$ such that $T(\sigma) = g(M(\sigma), \min(\sigma))$ if $M(\sigma)$ is different
from $M(\tau)$ for all $\tau \prec \sigma$. $T(\sigma)$ is undefined otherwise. $(f, T)$ PSD[T]-learns $\mathcal{F}$ because it
converges to a correct hypothesis after reading a polynomially bounded number of outputs
from $T$.

□

Theorem 5.4. $\mathrm{PMC} = \mathrm{PMC}[T] \subset \mathrm{PMC}[O] = \mathrm{PMC}[T,O]$

Proof. The proof of Lemma 3.8 implies that indexed families which meet the hypotheses are
not PMC-learnable. Thus, $CSD$ is PMC[O]-learnable, but not PMC-learnable. By Theorem
5.3, $\mathrm{PMC} = \mathrm{PMC}[T]$ and $\mathrm{PMC}[O] = \mathrm{PMC}[T,O]$. Hence, the desired claims are true.

□ 


6. Polynomial Size Characteristic Sample 

Definition 6.1. An indexed family, $\mathcal{F}$, is polynomial size characteristic sample learnable
(PCS-learnable) if there is a machine $M$, a polynomial $p$ and a family $\mathcal{H}$ such that for each
$F \in \mathcal{F}$, there is a corresponding $H \in \mathcal{H}$ such that $|H| < p(\mathrm{mi}_{\mathcal{F}}(F))$ and if $f$ is an enumeration
of $F$, then $M$ outputs the same encoding of $F$ on every initial segment of $f$ whose content
includes $H$. If an oracle is accessed, oracle use must also be bounded by $p(\mathrm{mi}_{\mathcal{F}}(F))$.

Theorem 6.2.

(1) $\mathrm{PCS}[O] \setminus \mathrm{PCS}[T] \neq \emptyset$,

(2) $\mathrm{PCS}[T] \setminus \mathrm{PCS}[O] \neq \emptyset$ and

(3) $\mathrm{PCS}[T,O] \setminus (\mathrm{PCS}[T] \cup \mathrm{PCS}[O]) \neq \emptyset$.

Proof. Define $G_0 = \mathbb{N}$, $G_n = [0, n]$ for $n > 0$, and $\mathcal{G} = \{G_n\}_{n \in \mathbb{N}}$. $\mathcal{G}$ is PCS[O]-learned by
$M$, where $M(a_0 \cdots a_n) = 0$ if the answer to a query about $\max\{a_0, \ldots, a_n\} + 1$ is true and is
a code for $[0, \max\{a_0, \ldots, a_n\}]$ otherwise. Since any string, $\sigma$, can either be extended to an
enumeration of $G_0 = \mathbb{N}$ or to an enumeration of $G_n = [0, n]$ for any $n > \max(\mathrm{content}(\sigma))$, no
learner-teacher pair can PCS[T]-learn $\mathcal{G}$. Thus, we have proved 1.
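
The oracle learner used for claim 1 is simple enough to sketch in Python. The sketch is an illustration under assumptions: `member` is a hypothetical callable standing in for the membership oracle, hypotheses are given directly as indices into the family ($0$ for $\mathbb{N}$ and $n$ for $[0, n]$), and the helper `interval_oracle` exists only to build test targets.

```python
from typing import Callable, Optional, Sequence


def interval_oracle(n: Optional[int]) -> Callable[[int], bool]:
    """Membership oracle for [0, n], or for all natural numbers when n is None."""
    if n is None:
        return lambda x: x >= 0
    return lambda x: 0 <= x <= n


def learner(sigma: Sequence[int], member: Callable[[int], bool]) -> int:
    """One oracle query per hypothesis: is max(sigma) + 1 in the target?"""
    m = max(sigma)
    if member(m + 1):
        return 0  # index 0: the target may still be all of the natural numbers
    return m      # index m: the target is [0, m]


if __name__ == "__main__":
    target = interval_oracle(5)
    print(learner([0, 2], target))     # 5 has not appeared yet
    print(learner([0, 2, 5], target))  # the characteristic sample {5} has appeared
```

For the target $[0, n]$ the single element $n$ acts as a characteristic sample: once it has appeared, the query about $n + 1$ is answered negatively on every longer initial segment and the hypothesis no longer changes.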

Fix k and suppose that k = {n,p*), where p is an increasing polynomial, and let M be 
the learner coded by n. Define to be a c.e. subset of + 1,2^*'''^] that satisfies three 
conditions. 

• 22^+1 + 1 e Fa:. 

. \Ek\=p{2k+\) + \. 

• For any enumeration, /, of [22^+1 -i- 1 , 22 ^+ 2 ]^ if Fyt c / [ i for some i < p(2k -l-1), then 
M(f[j) = 2kfori<j<p(2k+l). 

If no such set exists, let = 0. If + 0, we define a set satisfiying the following 
conditions. 

• $|D_k| = 2p(2k+1)+1$.

• $E_k \subset D_k \subset [2^{2k+1}+1, 2^{2k+2}]$.

• $D_k$ includes the first $p(2k+1)$ members of $[2^{2k+1}+1, 2^{2k+2}]$ about which $M$ queries
the oracle on a fixed uniformly computable enumeration of $E_k$.

If $E_k = \emptyset$, then $D_k = \emptyset$. To prove 2, define $\mathcal{F} = \{F_0, F_1, \ldots\}$ where $F_{2k} = [2^{2k+1}+1, 2^{2k+2}]$
and $F_{2k+1} = D_k \cup \{2^{2k+1}+1\}$. Observe that for any oracle learner, $M$, and polynomial, $p$, there is
a $k$ such that either

• there is an enumeration of $F_{2k+1}$ on which $M$ converges to $2k$ or $M$ makes more than
$p(2k+1)$ oracle queries, or

• $M$ does not have a characteristic sample for $F_{2k}$ of size at most $p(2k)$.

Thus, $\mathcal{F}$ is not PCS[O]-learnable. On the other hand, consider the learner, $M$, and teacher,
$T$, defined as follows. Once a number of the form $2^{2k+1}+1$ appears in the enumeration, $T$
outputs $2^{2k+1}+1$. Using $k$, $T$ then determines a natural number, $n$, and polynomial, $p$, such
that $k = \langle n, p \rangle$. The teacher outputs no further numbers until the number of distinct elements
of the enumeration exceeds $2p(2k+1)+1$. At this point, $T$ outputs $2^{2k+1}+2$. Simultaneously, $T$
calculates $D_k$. If $D_k$ is nonempty, then $T$ outputs the least element of $[2^{2k+1}+1, 2^{2k+2}] \setminus D_k$ if
it appears in the enumeration. $M$ returns $2k+1$ if $T$ has output only one element and returns
$2k$ if $T$ has output two or more elements. The learner-teacher pair PCS[T]-learns $\mathcal{F}$, proving
2.

We prove 3 by combining the two families defined above into one family: define $\mathcal{H} =
\{G_0 \otimes \{0\}, F_0 \otimes \{1\}, G_1 \otimes \{0\}, F_1 \otimes \{1\}, \ldots\}$. Were $\mathcal{H}$ PCS[O]-learnable, that would imply that
$\mathcal{F}$ is PCS[O]-learnable; similarly, if $\mathcal{H}$ were PCS[T]-learnable then $\mathcal{G}$ would also be PCS[T]-
learnable. That $\mathcal{H}$ is PCS[T,O]-learnable is witnessed by a learner-teacher pair (with access
to an oracle) that first waits to see whether the enumeration contains elements of the form
$(n,0)$ or $(n,1)$ and applies the appropriate learning algorithm as defined above.

□ 

The final theorem of this paper illustrates some of the relationships between PCS-learning 
and the other three types of polynomial-bounded learning. 

Theorem 6.3. 

(1) PMC \ PCS $\ne \emptyset$.

(2) PSD $\subset$ PCS.

(3) PCS \ PMC $\ne \emptyset$.

Proof. Observe that the family, $\mathcal{F}$, defined in the proof of Theorem 6.2 is also PMC-learnable.
Consider a learner, $M$, which attempts to compute $D_k$ and returns 0 until a number of the form
$2^{2k+1}+1$ appears in the enumeration. If this is the only number in the enumeration, then $M$
outputs $2k+1$. $M$ also outputs $2k+1$ if $M$ succeeds in computing $D_k$ and every element of
the enumeration is in $D_k \cup \{2^{2k+1}+1\}$. Otherwise, $M$ outputs $2k$. Since $\mathcal{F} \notin$ PCS, we have
proved 1.

We prove 2 in two parts. First, suppose that $M$ PSD-learns a family $\mathcal{G} = \{G_0, G_1, \ldots\}$ with
polynomial bound $p$. Let $C_i$ denote the first at most $p(i)$ elements of $G_i$. Define $M^*$ such that
$M^*(\sigma) = M(\tau)$, where $\tau$ lists the distinct elements of $\sigma$ in increasing order. Since $M$ must
PSD-learn $G_i$ on the increasing enumeration, $C_i$ must be a characteristic sample for $M^*$ on
$G_i$. Thus, PSD $\subseteq$ PCS. Now, consider $\mathcal{A} = \{A_0, A_1, \ldots\}$ where $A_n = \{n\} \oplus \mathbb{N}$. Consider the
string $\sigma_k$ consisting of the odd numbers from 1 to $2k+1$. For any member of $\mathcal{A}$, there is an
enumeration that begins with $\sigma_k$. Consequently, $\mathcal{A}$ is not PSD-learnable. Conversely, each
member of $\mathcal{A}$ has a characteristic sample of size 1. We conclude that PSD $\subset$ PCS.
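
The two constructions in this part can be sketched in Python. The sketch is illustrative and rests on assumptions that are not in the text: learners are modelled as functions from tuples of natural numbers to integer codes, the join is taken as $A \oplus B = \{2a : a \in A\} \cup \{2b+1 : b \in B\}$, the code of $A_n$ is $n$, the default hypothesis 0 is arbitrary, and the names `order_invariant` and `learn_join_family` are hypothetical.

```python
from typing import Callable, Tuple

Learner = Callable[[Tuple[int, ...]], int]  # assumed model of a learner


def order_invariant(learner: Learner) -> Learner:
    """Build M* from M: run M on the distinct elements of the input,
    listed in increasing order."""
    def m_star(sigma: Tuple[int, ...]) -> int:
        tau = tuple(sorted(set(sigma)))
        return learner(tau)
    return m_star


def learn_join_family(sigma: Tuple[int, ...]) -> int:
    """Learner for the family with A_n = {n} (+) N under the assumed
    join coding: A_n contains exactly one even number, namely 2n."""
    evens = [x for x in sigma if x % 2 == 0]
    # Hypothesis 0 is an arbitrary default before an even number appears.
    return evens[0] // 2 if evens else 0
```

With this coding the single element $2n$ already determines $A_n$, which is the characteristic sample of size 1 mentioned above, while a string of odd numbers alone carries no information about $n$.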

Given $n \in \mathbb{N}$, there are unique $i_n$ and $k_n$ such that $n = i_n + 2^{k_n}$ and $1 \le i_n \le 2^{k_n}$. Define
$\mathcal{G} = \{G_0, G_1, \ldots\}$ where $G_{2n} = \{n\} \oplus [0, 2^n]$ and $G_{2n+1} = \{k_n\} \oplus [0, i_n]$. In order to PCS-learn
$\mathcal{G}$, we define a learner $M$ as follows. Let $\sigma$ be an arbitrary string of natural numbers. If $\sigma$
contains no odd numbers or contains no even numbers, define $M(\sigma) = 0$. Otherwise, let $2n$
be the least even number in $\sigma$ and let $2m+1$ be the greatest odd number. $M(\sigma) = 2n$ if $m = 2^n$
and $M(\sigma) = 2k+1$, where $k = m + 2^n$, if $m \ne 2^n$. Each member of $\mathcal{G}$ has a characteristic sample
of size 2 for $M$, thus, $M$ PCS-learns $\mathcal{G}$. Conversely, suppose that $N$ PMC-learns $\mathcal{G}$. For each
$n$, $i$, $o_1, o_2, \ldots, o_{i-1}$ and $k = i + 2^n$, there is an $o_i$ such that
$N(2n\; 1\; 3^{o_1}\, 5^{o_2} \cdots (2i-1)^{o_{i-1}} (2i+1)^{o_i}) = 2k+1$. Thus, for any polynomial $p$ there is an $n$ such that $p(2n) < 2^n$ and an enumeration
of $G_{2n}$ on which $N$ outputs $2^n$ different hypotheses. We have proved 3.
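
The PCS-learner $M$ described in this part is simple enough to write out. The following Python sketch follows the description above, assuming the join coding $A \oplus B = \{2a : a \in A\} \cup \{2b+1 : b \in B\}$ and modelling a string as a tuple of natural numbers; the function name `pcs_learner` is hypothetical.

```python
from typing import Tuple


def pcs_learner(sigma: Tuple[int, ...]) -> int:
    """Hypothesis of M on sigma.

    Returns 0 until sigma contains both an even and an odd number.
    Afterwards, with 2n the least even and 2m + 1 the greatest odd
    number in sigma, returns 2n if m == 2**n and 2k + 1 with
    k = m + 2**n otherwise.
    """
    evens = [x for x in sigma if x % 2 == 0]
    odds = [x for x in sigma if x % 2 == 1]
    if not evens or not odds:
        return 0
    n = min(evens) // 2
    m = max(odds) // 2
    if m == 2 ** n:
        return 2 * n
    return 2 * (m + 2 ** n) + 1
```

Under the assumed coding, $G_{15} = \{2\} \oplus [0,3] = \{4\} \cup \{1,3,5,7\}$, and the two elements 4 and 7 already force the hypothesis 15. The same example shows why mind changes accumulate on $G_{2n}$: each new greatest odd number below $2 \cdot 2^n + 1$ changes the output.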

□ 